Abstract

In the National Basketball Association (NBA), teams must make choices about 
which players to acquire, how much to pay them, and other decisions that are fun- 
damentally dependent on player effectiveness. Thus, there is great interest in quan- 
titatively understanding the impact of each player. In this paper we develop a new 
penalized regression model for the NBA, use cross-validation to select its tuning pa- 
rameters, and then use it to produce ratings of player ability. We then apply the model 
to the 2010-2011 NBA season to predict the outcome of games. We compare the per- 
formance of our procedure to other known regression techniques for this problem, and 
demonstrate empirically that our model produces substantially better predictions. We 
evaluate the performance of our procedure against the Las Vegas gambling lines, and 
show that with a sufficiently large number of games to train on our model outperforms 
those lines. Finally, we demonstrate how the technique developed in this paper can be
used to quantitatively identify "overrated" players who are less impactful than common
wisdom might suggest.

Keywords: Basketball; Penalized Regression; Ridge Regression; Lasso; Convex Pro- 
gramming 
1 Introduction

The National Basketball Association (NBA) is a multi-billion dollar business. Each of the
thirty franchises in the NBA tries to field the most competitive team possible
within its budget. To accomplish this goal, a key task is to understand how good players
are.

A large fraction of the thirty NBA teams have quantitative groups analyzing data to 
evaluate and rate players. The website ESPN.com has many analysts providing statistical 
analysis for casual fans. Gambling houses use quantitative analysis to price bets on games, 
while gamblers try to use quantitative analysis to find attractive wagers. 

A popular technique for producing player ratings is weighted least-squares (LS)
regression.^1 However, as we show later, least squares is an approach with many flaws.

In this paper, we introduce a new penalized regression technique for estimating player 
ratings which we call Subspace Prior Regression (henceforth, SPR). SPR corrects some of 
the flaws of least squares for this problem setting, and has substantially better out-of-sample 
predictive performance. Furthermore, given sufficient training data SPR outperforms the 
Las Vegas wagering lines. 

We interpret the ratings produced by SPR, discussing whom it identifies as the best players in
the NBA (Section 6.1), who the most overrated and underrated players are (Section 6.2),
and what SPR suggests is the relative importance of different basic actions within the game,
like three point shooting and turnovers (Section 6.3). Finally, we discuss some possible
improvements to this model (Section 7).

1 This technique is also known as Adjusted Plus/Minus (APM) in the quantitative basketball community.



2 Notation 

We use the notation $\mathbb{R}^d_+$ to indicate the set
$\{x \in \mathbb{R}^d \mid x_i \ge 0 \ \forall i\}$. Let $1_n \in \mathbb{R}^{n \times 1}$ denote the
column vector of ones and $e_i \in \mathbb{R}^{n \times 1}$ signify the $i$th standard basis vector.
We use $I_p \in \mathbb{R}^{p \times p}$ to denote the identity matrix of size $p$, and Diag(w) to stand
for a diagonal matrix with entries given by the vector $w$. Given $a, b \in \mathbb{R}^n$ and
$c \in \mathbb{R}^n_+$ we define the inner product as $a^T b := \sum_{i=1}^n a_i b_i$, the $\ell_p$ norm
$\|a\|_p := \left[\sum_{i=1}^n |a_i|^p\right]^{1/p}$, and finally the $c$-weighted $\ell_p$ norm as
$\|a\|_{p,c} := \left[\sum_{i=1}^n c_i |a_i|^p\right]^{1/p}$.



3 A Brief Introduction to the Game of Basketball 

Each of the thirty teams in the NBA plays 82 games in a season, where 41 of these games 
are at their home arena and 41 are played away. Thus, there are 1,230 total games in an 
NBA regular season. Each team has a roster of roughly twelve to fifteen players. Games are 
usually 48 minutes long, and each of the two competing teams has exactly five players on 
the floor at a time. Thus, there are ten players on the floor for the duration of the game. 
Associated with each game is a box score, which records the statistics of the players who 
played in that game. Figure 1 contains a sample box score from an NBA game played on
February 2nd, 2011 by the Dallas Mavericks (the home team) against the New York Knicks 
(the away team). Note that we only display the box score for the Mavericks players. Observe 
that there are 12 players listed in the box score, but only 11 who actually played for the 
Mavericks in this game. Each of the columns of this box score corresponds to a basic statistic 
of interest (the column REB in the box score denotes rebounds, AST denotes assists, etc.).



3.1 Statistical Modeling of Basketball 



To statistically model the NBA, we must first extract from each game a dataset suitable
for quantitative analysis. There is a standard procedure for this currently used by many
basketball analysts (Kubatko et al., 2007; Oliver, 2004), which we describe as follows.



We model each basketball game as a sequence of $n$ distinct events between two teams.
During event $i$ the home team scores $Y_i$ more points than the away team. We use the variable
$p$ to denote the total number of players in the league (in a typical NBA season, $p \approx 450$).
We can then represent the current players on the floor for event $i$ with a vector
$X_i \in \mathbb{R}^p$ defined as
Figure 1: Sample single-game box score for the Dallas Mavericks
Associated with event $i$ is a weighting factor $w_i$. Roughly speaking, the $i$th event happens
for $w_i$ minutes.
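One encoding that is common in adjusted plus/minus analyses, and that is assumed here rather
than quoted from the text, sets the entry for a player to +1 if he is on the floor for the
home team during the event, -1 if he is on the floor for the away team, and 0 otherwise. A
minimal sketch of that assumed encoding follows; the function name and the player indices are
hypothetical.

```python
import numpy as np

def build_design_row(home_on_floor, away_on_floor, num_players):
    """Return the event vector X_i under the assumed +1 / -1 / 0 encoding."""
    x = np.zeros(num_players)
    x[list(home_on_floor)] = 1.0    # home players currently on the floor
    x[list(away_on_floor)] = -1.0   # away players currently on the floor
    return x

# Toy league of 12 players; the indices are illustrative only.
x_i = build_design_row(home_on_floor=[0, 1, 2, 3, 4],
                       away_on_floor=[6, 7, 8, 9, 10],
                       num_players=12)
```

With five players per side on the floor, every such row has exactly ten nonzero entries and
sums to zero.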

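Figure 1 contains a sample box score. We summarize box score data like that of Figure 1
with the matrix $R^{\text{Mavericks}}_{\text{Game \#1}}$, which looks like

$$
R^{\text{Mavericks}}_{\text{Game \#1}} =
\begin{array}{l|ccccc}
 & \text{MIN} & \text{FGM} & \text{FGA} & \cdots & \text{PTS} \\ \hline
\text{Brian Cardinal} & 10 & 1 & 1 & \cdots & 3 \\
\text{Dirk Nowitzki} & 33 & 10 & 16 & \cdots & 29 \\
\vdots & \vdots & \vdots & \vdots & & \vdots \\
\text{Peja Stojakovic} & \cdots & \cdots & \cdots & \cdots & \cdots
\end{array}
$$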

This matrix records the statistics of the 12 players on the Dallas Mavericks roster for that
particular game. If there are $d$ basic statistics of interest in this box score, then
$R^{\text{Mavericks}}_{\text{Game \#1}}$ is a matrix of size 12 by $d$.

One can imagine computing the aggregate box score matrix $R^{\text{Mavericks}}$
that summarizes the total statistics of these 12 players for an entire season. Finally,
define the $p \times d$ matrix $R$ that vertically concatenates $R^{\text{Team } j}$ across the
30 teams in the NBA:

$$
R =
\begin{pmatrix} R^{\text{Team 1}} \\ R^{\text{Team 2}} \\ \vdots \\ R^{\text{Team 30}} \end{pmatrix}
=
\begin{pmatrix} R^{\text{Mavericks}} \\ R^{\text{Bulls}} \\ \vdots \\ R^{\text{Celtics}} \end{pmatrix}.
$$



R summarizes the season box score statistics for all p players who played in the NBA for 
that year. 



3.2 Least Squares Estimation 

We want to determine the relationship between $X_i$ and $Y_i$, i.e., find a function $f$ such that
$Y_i \approx f(X_i)$. One natural way to do this is through a linear regression model, which assumes
that

$$Y_i = \alpha^*_{hca} + X_i^T \beta^* + \epsilon_i, \qquad i = 1, 2, \ldots, n.$$

Recall that event $i$ has a weighting factor $w_i$ associated with it. Roughly speaking,
event $i$ happens for $w_i$ minutes.

The scalar variable $\alpha^*_{hca}$ represents a home court advantage term, while the variable
$\beta^* \in \mathbb{R}^p$ is interpreted as the number of points each of the $p$ players in the
league "produces" per minute. This model recognizes players for whom their team is more
effective because of their presence on the floor.

For notational convenience, we stack the variables $Y_i$, $w_i$, and $\epsilon_i$ into the
$n$-vectors $Y$, $W$, and $E$ and the variables $X_i$ into the $n \times p$ matrix $X$. This yields
the matrix expression

$$Y = 1_n \alpha^*_{hca} + X\beta^* + E. \qquad (1)$$

Given observations $(Y, X)$ and weights $W$, we define the $W$-weighted quadratic loss
function as

$$L_{\text{quadratic}}(\alpha_{hca}, \beta) := \|Y - 1_n \alpha_{hca} - X\beta\|_{2,W}^2. \qquad (2)$$

A natural technique for estimating the variables $\alpha^*_{hca}$ and $\beta^*$ is to minimize (2), i.e.,

$$(\hat\alpha^{LS}_{hca}, \hat\beta^{LS}) = \arg\min_{\alpha_{hca},\, \beta} L_{\text{quadratic}}(\alpha_{hca}, \beta), \qquad (3)$$

resulting in a weighted least squares (LS) problem. The values $\hat\beta^{LS}$ are known in the
quantitative basketball community as the adjusted plus/minus ratings.^2 The website
Basketballvalue^3 has computed $\hat\beta^{LS}$ for several recent seasons.

2 http://www.82games.com/ilardi1.htm
3 http://www.basketballvalue.com
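The weighted least squares fit can be sketched with NumPy by scaling each row by the square
root of its weight and solving an ordinary least squares problem. This is an illustration on
synthetic, noise-free data, not the code used to produce the published ratings; the function
name and all numbers are assumptions.

```python
import numpy as np

def weighted_least_squares(X, Y, W):
    """Minimize sum_i W_i * (Y_i - alpha - X_i^T beta)^2 over (alpha, beta)."""
    n = X.shape[0]
    A = np.hstack([np.ones((n, 1)), X])   # first column carries the home court term
    s = np.sqrt(W)                        # row scaling turns weighted LS into plain LS
    coef, *_ = np.linalg.lstsq(A * s[:, None], Y * s, rcond=None)
    return coef[0], coef[1:]

# Synthetic example: 200 events, 5 "players", no noise.
rng = np.random.default_rng(0)
X = rng.choice([-1.0, 0.0, 1.0], size=(200, 5))
beta_true = np.array([2.0, -1.0, 0.5, 0.0, 1.5])
Y = 3.5 + X @ beta_true
W = rng.uniform(0.5, 2.0, size=200)
alpha_hat, beta_hat = weighted_least_squares(X, Y, W)
```

On real lineup data the design matrix is often close to rank deficient, in which case
`np.linalg.lstsq` returns the minimum-norm solution.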
3.3 Is least squares regression a good estimator of player value?



Table 1: LS Player Ratings

Rank  Player            $\hat\beta^{LS}_i$
1     James, LeBron     12.62
2     Durant, Kevin
3     Nash, Steve       11.39
4     Paul, Chris
5     Nowitzki, Dirk    10.33
6     Collison, Nick    9.87
7     Wade, Dwyane      9.59
8     Hilario, Nene     8.56
9     Deng, Luol        8.46
10    Howard, Dwight    8.13


Table 1 lists the top ten players in the NBA for the combined 2009-2010 and 2010-2011
NBA regular seasons by their ratings produced from least squares.^4 By this ranking, LeBron
James was the best player in the league over this two year period. Since
$\hat\beta^{LS}_{\text{LeBron James}} = 12.62$, this procedure suggests that he is worth an
additional 12.62 net points to his team for every 100 possessions the team plays.

How believable are the player ratings of Table 1? The list has many of the players widely
considered the best in the NBA. However, there are also some names on this list that
are questionable. If we believe these ratings, then Nick Collison, a player considered by
most fans and analysts to be at best a merely average player at his position, is better than
Dwyane Wade and Dwight Howard, two of the premier superstars in the league. Similarly,
while Nene Hilario and Luol Deng are good players, they are not considered by most fans
and analysts to be amongst the top ten players in the NBA.

This contradiction between common wisdom and least squares is useful, since it can either 
reveal to us that the common wisdom is wrong or that the least squares approach is incorrect. 
We need some basis of comparison to evaluate how well least squares is performing. 

In classical linear regression, assuming that the generative model satisfies certain
conditions, the least squares estimate has several desirable properties (maximum likelihood
estimate, best linear unbiased estimate, consistency, asymptotic normality, etc.). However,
these properties typically assume that the underlying model satisfies certain technical
conditions like normality, linearity, and statistical independence. It is unreasonable to expect
that these technical conditions hold for the game of basketball. Thus, we must find other ways
to evaluate how trustworthy the $\hat\beta^{LS}$ values are, and whether they should be believed
over common wisdom about players. One simple approach for evaluating the least squares
model is to test its predictive power versus a simple dummy estimator.



4 These numbers were obtained from http://basketballvalue.com/topplayers.php?&year=2010-2011
To do this, we

1. define a dummy estimator that sets $\hat\beta^{\text{Dummy}}_i = 0$ for each player, and the
home court advantage term $\hat\alpha^{\text{Dummy}}_{hca} = 3.5$. In other words, each player is
rated a zero, and the home team is predicted to win every 100 possessions by 3.5 points;

2. compute both the least squares estimate and the dummy estimate for the first
820 games of an NBA season, and measure how well each technique does in estimating
the margin of victory of the home team for the remaining 410 games of that season.

If least squares accurately models the NBA, then at a minimum it must substantially
outperform the dummy estimator. Let us use the variable $A_k$ to denote the actual number
of points by which the home team wins game $k$, $\hat A_k$ to denote the number of points
predicted by the statistical estimator of interest, and $E_k := \hat A_k - A_k$ to denote the
error this statistical estimator makes in predicting the outcome of game $k$.

Figure 2 is a histogram of the error variable $E_k$ over the course of the 410 games under
consideration from the 2010-2011 NBA season for each technique. A perfect estimator would
have a spike of height 410 centered around zero. Thus, the "spikier" the histogram looks,
the better a method performs. It is hard to immediately say from Figure 2 that the least
squares estimate yields better predictions than the simple dummy estimate. We can also
study some of the empirical properties of $E_k$ for each approach. Table 2 summarizes the
results.
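The summary statistics used in this comparison are simple to compute once predicted and actual
home margins are available. The sketch below is illustrative: the helper name is hypothetical
and the four toy margins are invented for the example, not drawn from the 2010-2011 data.

```python
import numpy as np

def summarize_errors(predicted, actual):
    """Summarize prediction errors E_k = predicted_k - actual_k over a block of games."""
    predicted = np.asarray(predicted, dtype=float)
    actual = np.asarray(actual, dtype=float)
    abs_err = np.abs(predicted - actual)
    # The winner is guessed wrong when predicted and actual margins differ in sign.
    wrong = np.sign(predicted) != np.sign(actual)
    return {
        "fraction_wrong_winner": float(np.mean(wrong)),
        "mean_abs_error": float(np.mean(abs_err)),
        "median_abs_error": float(np.median(abs_err)),
    }

# Toy data: a constant "home team by 3.5" prediction against four invented outcomes.
actual_margins = [5, -3, 12, -8]
dummy_predictions = [3.5, 3.5, 3.5, 3.5]
stats = summarize_errors(dummy_predictions, actual_margins)
```

A constant prediction in favor of the home team is wrong about the winner exactly when the
away team wins, which is why the dummy estimator's error rate equals the away win rate.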



Table 2: Performance of Statistical Estimators over the last 410 games 
When comparing least squares to the dummy estimate, we notice that

1. least squares reduces the percentage of games in which the wrong winner is identified
from 39.27% to 33.66% over the block of 410 games of interest.

2. Unfortunately, the empirical behavior of $E_k$ seems to be substantially worse for least
squares. For example, the empirical mean of $|E_k|$ is 18.05 for least squares, while only
10.54 for the dummy estimator. Thus, least squares makes larger average errors when
predicting the final margin of victory of games.

As a result, it is hard to convincingly argue that the least squares approach is a better model
for the NBA than the dummy estimate.
Figure 2. Comparison of Dummy, Least Squares, Ridge Regression, SPR and SPR2 trained
on 820 games.



4 SPR: Improving Least Squares 

Although Figure 2 and Table 2 suggest that the LS estimate performs poorly, this doesn't
necessarily mean that the linear model (1) is without promise. The least squares estimate
simply doesn't take into account the following two key pieces of information we have about
the problem domain:

1. Model sparsity: The NBA is a game dominated by star players. Lesser players have
far less impact on wins and losses. This folk wisdom informs player acquisitions and
salaries. For example, with a $60 million budget, one would much rather acquire three
elite $15 million stars and fill out the rest of the roster with cheap role-players, than
spend heavily on role-players and skimp on stars.

This "elites first" strategy was used by the Boston Celtics in the summer of 2007 when
they traded their role-players and other assets to build a team around Kevin Garnett,
Paul Pierce and Ray Allen,^5 and more recently by the Miami Heat in the summer of
2010, who built a team around LeBron James, Dwyane Wade and Chris Bosh.^6 We shall
incorporate this prior information through $\ell_1$ regularization. This penalizes non-sparse
models, and should cause only the very best players to stand out in the regression.
This suggests a penalty term of the form $\lambda_1 \|\beta\|_1$.

5 http://www.nba.com/celtics/news/press073107-garnett.html

2. Box score information: Another valuable piece of information useful in inferring player
worth is the box score statistics matrix $R$. One expects good players to not only have
high APM ratings, but to also produce rebounds, assists, blocks, steals, etc. Thus,
we prefer ratings $\beta$ which are consistent with box score statistics. In other words, we
expect a ratings vector to be "close" to the column space of $R$. We therefore should
penalize ratings for which the distance from $\beta$ to $Rz$ is large. Although there are many
different possible penalties one can choose, in this work we choose a quadratic penalty
term of the form $\lambda_2 \|\beta - z_0 1_p - Rz\|_2^2$.

We can encode the above prior information through the function
$g(\alpha_{hca}, \beta, z_0, z; \lambda)$ defined as

$$
g(\alpha_{hca}, \beta, z_0, z; \lambda) :=
\underbrace{L_{\text{quadratic}}(\alpha_{hca}, \beta)}_{\text{Weighted least squares}}
+ \underbrace{\lambda_1 \|\beta\|_1}_{\text{Sparse player ratings}}
+ \underbrace{\lambda_2 \|\beta - z_0 1_p - Rz\|_2^2}_{\text{Box score prior}}, \qquad (4)
$$

where

1. $R$ is a $p \times d$ matrix containing the box-score statistics of the $p$ different players,

2. the variable $z$ gives us weights for each of the box score statistics,

3. and the vector $(\lambda_1, \lambda_2) \in \mathbb{R}^2_+$ contains the regularization parameters.

We shall use the shorthand $\lambda$ to denote the pair $(\lambda_1, \lambda_2)$. We can find a
model consistent with both the data and the prior information by solving the convex
optimization problem

$$(\hat\alpha^{\lambda}_{hca}, \hat\beta^{\lambda}, \hat z_0^{\lambda}, \hat z^{\lambda}) = \arg\min_{\alpha_{hca},\, \beta,\, z_0,\, z} g(\alpha_{hca}, \beta, z_0, z; \lambda). \qquad (5)$$

We call the procedure described by Equation (5) the SPR algorithm, and the vector
$\hat\beta^{\lambda}$ contains the player ratings produced by it. One very important difference
between SPR and the least squares approach is that SPR yields both a player rating vector
$\hat\beta^{\lambda}$ and a box score weights vector $\hat z^{\lambda}$. The weights vector
$\hat z^{\lambda}$ is a valuable tool in its own right. It provides numerical values for different
basic box score statistics like scoring, rebounding, and steals. We further interpret
$\hat z^{\lambda}$ in Section 6.3.
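To make the three terms of the SPR objective concrete, the sketch below evaluates it with
NumPy. It assumes the weighted quadratic loss takes the form of a W-weighted sum of squared
residuals; the function name and the tiny arrays are hypothetical, and this only evaluates the
objective rather than reproducing the solver derived in the appendix.

```python
import numpy as np

def spr_objective(alpha_hca, beta, z0, z, X, Y, W, R, lam1, lam2):
    """Evaluate the SPR objective for given variables and regularization weights."""
    residual = Y - alpha_hca - X @ beta
    loss = np.sum(W * residual**2)            # weighted least squares term (assumed form)
    sparsity = lam1 * np.sum(np.abs(beta))    # l1 penalty favoring few standout players
    gap = beta - z0 - R @ z                   # distance of ratings from the box score fit
    box_prior = lam2 * np.sum(gap**2)
    return loss + sparsity + box_prior

# Tiny invented example: 2 events, 2 players, 1 box score statistic.
X = np.array([[1.0, -1.0], [-1.0, 1.0]])
Y = np.array([2.0, -2.0])
W = np.array([1.0, 1.0])
R = np.array([[1.0], [0.0]])
value = spr_objective(alpha_hca=0.0, beta=np.array([1.0, -1.0]), z0=0.0,
                      z=np.array([1.0]), X=X, Y=Y, W=W, R=R, lam1=0.5, lam2=2.0)
```

In this toy case the residuals vanish, so the value is made up of the two penalty terms alone.
Because every term is convex in its arguments, a generic convex solver could be applied to the
same objective.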

Furthermore, SPR yields a linear formula for transforming player box score effectiveness
into a player productivity rating through the equation

$$\hat\theta^{\lambda} = R \hat z^{\lambda}. \qquad (6)$$

$\hat\theta^{\lambda}$ can be viewed as an additional player rating vector produced by SPR, one
that linearly transforms each player's box score production into a "points per 100 possessions"
rating similar to least squares or $\hat\beta^{\lambda}$. Thus, $\hat\theta^{\lambda}$ succinctly
converts the box score production of each player into a single number.

6 http://sports.espn.go.com/nba/news/story?id=5365165

Thus for player $i$, we can compare the variable $\hat\beta^{\lambda}_i$ to the variable
$\hat\theta^{\lambda}_i$ to understand how "overrated" or "underrated" he is relative to his box
score production. This is useful, since many players produce great box score statistics but
don't necessarily impact team competitiveness to the level the box score might suggest. We
explore this aspect of SPR in further detail in Section 6.2.



4.1 Bayesian Interpretation of SPR

SPR can be interpreted as the posterior mode for a Bayesian statistics model. Suppose that
$Y$, $\alpha_{hca}$, $\beta$, $z_0$, $z$ are all random variables.

Let

• $Y_i \mid \alpha_{hca}, \beta \sim \mathcal{N}(\alpha_{hca} + X_i^T \beta, \tfrac{1}{2})$,

• $\alpha_{hca}$ have the improper prior $P(\alpha_{hca} = a) \propto 1$,

• $P(\beta \mid z_0, z) \propto e^{-\lambda_1 \|\beta\|_1 - \lambda_2 \|\beta - z_0 1_p - Rz\|_2^2}$,

• $z_0$ have the improper prior $P(z_0 = k) \propto 1$,

• and $z$ have the improper prior $P(z = k) \propto 1$.

Then the solution to SPR with $W = 1_n$ is exactly the mode of the posterior distribution
$P(\alpha_{hca}, \beta, z_0, z \mid Y)$.
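The claim can be checked by writing out the negative log posterior. The sketch below assumes
that the observations are conditionally independent with variance 1/2, and that the
proportionality in the prior on the ratings holds jointly in the ratings and the box score
variables; both are assumptions of this sketch.

```latex
\begin{aligned}
-\log P(\alpha_{hca}, \beta, z_0, z \mid Y)
  &= -\sum_{i=1}^{n} \log P(Y_i \mid \alpha_{hca}, \beta)
     - \log P(\beta \mid z_0, z) + \text{const} \\
  &= \sum_{i=1}^{n} \left(Y_i - \alpha_{hca} - X_i^T \beta\right)^2
     + \lambda_1 \|\beta\|_1
     + \lambda_2 \|\beta - z_0 1_p - Rz\|_2^2 + \text{const}.
\end{aligned}
```

The first sum is the quadratic loss with unit weights, so minimizing the right-hand side is the
SPR problem and its minimizer is the posterior mode.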



4.2 Selecting the regularization parameter $\lambda$



For SPR to be useful, we need to be able to select a good choice of $\lambda$ quickly.
Cross-validation (Stone, 1974) is one standard technique in statistics for doing this. To select
regularization parameters, we use 10-fold cross-validation. We cross-validate over
regularization parameters from the set

$$\Lambda := \{(2^a, 2^b) \mid a, b \in F\}$$

where

$$F := \{-10, -9, \ldots, 9\}.$$

$K$-fold cross-validation on $T$ different values of $\lambda$ means solving $TK$ different SPR
problems, each of which is a convex program of moderate size ($n \approx 20000$, $p \approx 450$,
$d \approx 20$). Thus, it is necessary that

1. for each fixed value of $\lambda$, SPR can be solved quickly

2. and that many values of $\lambda$ can be evaluated at once.

Table 3: Regularization parameters obtained from 10-fold cross-validation

Setting                 $\lambda_1$    $\lambda_2$
$\lambda^{820}_{CV}$    $2^{-10}$      $2^{-3}$
$\lambda^{410}_{CV}$    $2^{-7}$       $2^{-2}$
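The size of this search can be made concrete with a short sketch that enumerates the grid and
the folds. The contiguous fold layout and the helper name are assumptions made for
illustration; the text does not say how games were assigned to folds.

```python
from itertools import product

F = range(-10, 10)                                   # the exponent set {-10, ..., 9}
grid = [(2.0**a, 2.0**b) for a, b in product(F, F)]  # candidate (lambda_1, lambda_2) pairs
K = 10                                               # number of folds

def kfold_indices(n, k):
    """Yield (train, validation) index lists for k contiguous folds (assumed layout)."""
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        validation = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, validation
        start += size

# Every (lambda, fold) pair is an independent SPR fit, so the fits can run in parallel.
num_problems = len(grid) * K
folds = list(kfold_indices(n=23, k=K))
```

With 20 exponents per coordinate the grid holds 400 pairs, so 10-fold cross-validation
requires 4,000 separate fits.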

To address the first issue, we implemented a fast numerical algorithm for solving SPR
for a fixed value of $\lambda$. See Appendix A for a derivation.

To address the second issue, our cross-validation code takes advantage of the cloud
computing service PiCloud^7 to perform the computations in parallel.

The resulting regularization parameters learned by cross-validation are summarized in
Table 3.



5 The Performance of SPR 

Our ultimate goal is to produce substantially better estimates of player value than least
squares. If, despite all the additional computational work that SPR requires, there is little
or no statistical improvement, then SPR is not of much practical value.
In this section, we discuss the performance of SPR on the 2010-2011 NBA dataset. We
demonstrate that SPR substantially outperforms both the dummy estimate and the least squares
estimate, and outperforms even Las Vegas given a sufficient amount of training data.

5.1 SPR outperforms least squares 

From Table 3 we see that the cross-validation methodology described in Section 4.2 on the
first 820 games of the 2010-2011 season yields the regularization parameter

$$\lambda^{820}_{CV} = (2^{-10}, 2^{-3}).$$

Armed with this choice, we can now compare least squares to SPR on the final 410 games
of the 2010-2011 NBA regular season. Each procedure produces a player rating vector
$\hat\beta$, and we can use these ratings to predict the final margin of victory over this
collection of games.

Recall that we use the variable $\hat A_i$ to denote the number of points by which a statistical
estimator predicts that the home team will win game $i$, $A_i$ to denote the actual number of
points by which the home team wins game $i$, and $E_i = \hat A_i - A_i$ to denote the difference
between these quantities.

7 http://www.picloud.com

Figure 2 is a histogram of the variable $E_i$ for each technique. It is clear from Figure
2 that SPR produces better estimates than least squares (APM). The histogram of the SPR
errors is "spikier" around the origin than that of the APM errors. We can also study some of
the empirical properties of the variable $E_i$ for each approach. Table 2 summarizes the
results. As Table 2 indicates, SPR represents a substantial improvement on APM in nearly all
of these statistical measures. In particular,

• the fraction of games in which the wrong winner is guessed decreases from 33.66% with
LS to 28.54% with SPR; and

• the average absolute error in predicting the margin of victory decreases from 18.05 to
10.55.

Comparing SPR to the dummy estimator, we see that

• SPR is far better at predicting the winning team. The percentage
of games in which the wrong winner is predicted falls from 39.27% to 28.54%.

• Both techniques obtain a similar average absolute error in predicting the margin of
victory, with 10.54 for the dummy estimator and 10.55 for SPR.

Overall, this suggests that SPR more accurately models the NBA than the least squares
estimator.

5.2 SPR outperforms Las Vegas 

To convincingly evaluate the performance of SPR, we examine whether it actually results in
a profitable gambling strategy against the Vegas lines. In fact, we will compare the dummy,
least squares and SPR estimators. Given predictions by each of the above estimators, we have
the following natural gambling strategy:

1. If the deviation $\Delta$ between the estimator's prediction of the outcome of a game and
the Vegas line is greater than 3, place a bet on the team the estimator favors.

2. Otherwise, don't bet.

Due to the transaction costs that sportsbooks charge,^8 a gambling strategy
must win more than roughly 52.5% of the time to at least break even. Table 4 summarizes
the result of this gambling rule for each of the three techniques of interest over the last
410 games of the 2010-2011 NBA season. The dummy-based gambling strategy places 263
bets on the 410 games and loses 3 more bets than it wins, for a winning percentage below
50%, which is performance comparable to random guessing, and not enough to break even.
The least squares-based strategy has a winning percentage of 51.97% on 356 bets made. In
comparison, SPR places wagers on 290 games and wins 57.24% of these bets. This represents
a very profitable betting strategy, and thus suggests that SPR more accurately models the
NBA than major alternatives, including the estimators used by Las Vegas. Finally, SPR
obtains this improved performance while only having access to the first 820 games of the
regular season.

8 The fee is called the "vigorish" in the gambling community.
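The betting rule can be sketched as follows. Several details are assumptions made for
illustration: the Vegas line is expressed as a predicted home margin, a bet wins when the
chosen side beats that line, games landing exactly on the line are ignored, and the break-even
rate uses standard 11-to-10 pricing. The function name and the four toy games are invented.

```python
import numpy as np

BREAK_EVEN = 11 / 21   # risking 11 to win 10 requires winning about 52.4% of bets

def run_betting_strategy(predicted, vegas, actual, delta=3.0):
    """Bet only when the prediction differs from the Vegas line by more than delta."""
    predicted, vegas, actual = (np.asarray(a, dtype=float)
                                for a in (predicted, vegas, actual))
    gap = predicted - vegas
    bet_home = gap > delta             # estimator likes the home side more than Vegas does
    bet_away = gap < -delta            # estimator likes the away side more than Vegas does
    home_covers = actual > vegas
    away_covers = actual < vegas
    wins = int(np.sum(bet_home & home_covers) + np.sum(bet_away & away_covers))
    losses = int(np.sum(bet_home & away_covers) + np.sum(bet_away & home_covers))
    decided = wins + losses
    return {"bets": int(np.sum(bet_home | bet_away)), "wins": wins, "losses": losses,
            "win_pct": wins / decided if decided else float("nan")}

# Four invented games: home margins predicted by a model, by Vegas, and observed.
result = run_betting_strategy(predicted=[8, -6, 1, 4], vegas=[3, -1, 2, 2.5],
                              actual=[10, 3, 5, 1], delta=3.0)
```

Raising `delta` trades volume for selectivity: fewer bets are placed, but only on games where
the estimator disagrees strongly with the line.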

Table 4: Betting Strategy over the last 410 games, $\Delta = 3$

Statistic             Dummy     LS        RR        SPR       SPR2
# of bets possible    410       410       410       410       410
# of bets made        263       356       346       290       321
Net # of bets won     -3        14        26        42        21
Winning percentage    0.4943    0.5197    0.5376    0.5724    0.5327



5.3 Robustness of Results 

How sensitive is the SPR algorithm to our choice of training on the first 820 games? Does
the performance relative to the least squares estimate degrade if the estimators are trained
on far fewer games? To evaluate this, we train the estimators on the first 410 games and
then evaluate predictive power on the remaining 820 games. From Table 3 we obtain the
cross-validation selected regularization parameter

$$\lambda^{410}_{CV} = (2^{-7}, 2^{-2}).$$

We also compare against the Las Vegas predictions for that block of 820 games. Table
5 summarizes the results of this experiment. As before, SPR outperforms both the Dummy
estimator and LS. Furthermore, by increasing $\Delta$ to 5 (from the value 3 used when training
on 820 games), SPR still leads to a successful betting strategy, as Table 6 shows.

6 What does SPR say about the NBA? 

In the previous section, we evaluated the performance of SPR by testing its ability to predict
the outcome of unseen games. In this section, we interpret the box score weights vector
$\hat z^{\lambda}$ and player rating vector $\hat\beta^{\lambda}$ returned by SPR, and discuss what
they say about the NBA.

6.1 Top 10 players in the league

From $\hat\beta^{\lambda}$ we can extract a list of the top 10 players in the league who have
played at least 10 possessions. Table 7 summarizes these results. This list contains some of the most
Table 5: Robustness Experiment, First 410 Games

Metric                             Dummy        LS           RR           SPR
RMSE (millions)                    1087.4967    1209.3042    1130.7781    1078.788
Fraction of games guessed wrong    0.4024       0.4073       0.3732       0.3049
Mean of $|E_i|$                    10.3326      28.9714      19.9875      11.4719
Variance of $|E_i|$                64.9664      524.1851     238.1755     74.1957
Median of $|E_i|$                  9.2783       23.3688      17.3077      9.5853
Min of $|E_i|$                     0.0279       0.0491       0.0158       0.0208
Max of $|E_i|$                     49.2299      150.4097     98.2374      43.1902
Empirical $P(|E_i| > 1)$           0.9207       0.9756       0.9659       0.9573
Empirical $P(|E_i| > 3)$           0.7768       0.9329       0.9000       0.8378
Empirical $P(|E_i| > 5)$           0.6744       0.8817       0.8195       0.7378
Empirical $P(|E_i| > 10)$          0.4720       0.7805       0.6890       0.4780

Table 6: Betting Strategy over the last 820 games, $\Delta = 5$

Statistic             Dummy     LS        RR        SPR
# of bets possible    820       820       820       820
# of bets made        342       700       658       503
Net # of bets won     -10       6         14        57
Winning percentage    0.4854    0.5043    0.5106    0.5567



Table 7. SPR Player Ratings

Player            $\hat\beta^{\lambda}_i$
James, LeBron     8.6006
Garnett, Kevin    8.2860
Paul, Chris       8.1899
Nowitzki, Dirk    7.7296
Howard, Dwight    7.4706
Gasol, Pau        6.9251
Odom, Lamar       6.5630
Hilario, Nene     6.2170
Evans, Jeremy     6.1725
Nash, Steve       6.0349

prominent star players in the league (LeBron James, Chris Paul, Dirk Nowitzki, Dwight
Howard), thus agreeing with common basketball wisdom. However, this ranking contradicts
common basketball wisdom in the following ways:

1. The list noticeably omits Kobe Bryant, a player whom pop culture and common basketball
wisdom consider one of the league's superstars. Yet SPR thinks very highly of Pau
Gasol and Lamar Odom, two of Kobe Bryant's teammates who are individually credited
far less for the success of the Lakers than Kobe is.

2. The list includes Nene Hilario and Jeremy Evans, players who are not considered by 
most to be amongst the top 10 players in the league. 

6.2 Top 10 most underrated and overrated players 

There are certain players in the NBA whose impact on the game seems to be far
more (or less) than their raw box score production suggests. SPR allows us to identify these
players and quantify their impact by measuring the discrepancy between their SPR rating
$\hat\beta^{\lambda}_i$ and their weighted box score rating $\hat\theta^{\lambda}_i$.
We define the underrated vector $U$ as

$$U := \hat\beta^{\lambda} - \hat\theta^{\lambda}.$$

Similarly, we can examine which players impact the game much less than their box score
production suggests with the vector $O := -U$.
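The underrated and overrated vectors follow directly from the fitted ratings and box score
weights. The sketch below uses three invented players and two invented statistics; the helper
name is hypothetical and the numbers are not SPR output.

```python
import numpy as np

def underrated_scores(beta, R, z):
    """Return the box score rating theta = R z and the underrated vector U = beta - theta."""
    theta = R @ z
    return theta, beta - theta

beta = np.array([4.0, 1.0, -2.0])            # regression-based ratings (invented)
R = np.array([[2.0, 1.0],                    # per-player box score rates (invented)
              [3.0, 0.0],
              [0.0, 1.0]])
z = np.array([1.0, 0.5])                     # box score weights (invented)

theta, U = underrated_scores(beta, R, z)
O = -U                                       # overrated vector
most_underrated = np.argsort(-U)             # player indices, most underrated first
most_overrated = np.argsort(-O)              # player indices, most overrated first
```

Sorting by `U` and `O` gives the two rankings; the top ten entries of each correspond to the
lists discussed in this section.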

Table 9 lists the top 10 most underrated/overrated players in the league relative to their
box score production. For at least a few of these players, it is easy to understand why box
scores alone do a poor job of capturing their impact:

• Andris Biedrins is a severe liability offensively, due to both his inability to score
beyond 5 feet of the basket and his poor free throw shooting. This makes it much more difficult
for his teammates to score, since his defender can shift attention away from him and
instead provide help elsewhere. Biedrins is also a liability defensively.

• Goran Dragic is a point guard with a scoring mentality. While a "shoot-first" point 
guard is not necessarily harmful to a team, if he doesn't do a good enough job in 
setting up his teammates and creating easy scoring opportunities for them, it hurts his 
team's ability to score. 

6.3 Box score weights produced by SPR 

The SPR regression also produces box score weights $\hat z^{\lambda}$ that tell us the relative
importance of the different box score statistics. $\hat z^{\lambda}$ gives us a method to linearly
transform box score data into the player effectiveness rating $\hat\theta^{\lambda}$ defined in
Equation (6). For player $j$, the variable $\hat\theta^{\lambda}_j$ is a weighted linear
combination of his box score statistics.

We can examine each entry of the vector $\hat z^{\lambda}$ to compare the relative importance of
different box score variables like rebounds, assists and steals. Table 8 summarizes the results.
We also display the relevant row in the box score matrix $R$ for LeBron James, which we call
$R_{\text{LeBron James}}$.

Examining this table, we see that LeBron James made two point shots at a rate of 7.83
per 36 minutes, and attempted two point shots at a rate of 14.16 per 36 minutes. The
corresponding weightings from $\hat z^{\lambda,\text{rescaled}}$ are 3.38 and $-1.54$ respectively,
suggesting that overall LeBron's rating from his two point shooting is
$7.83 \times 3.38 + 14.16 \times (-1.54) \approx 4.66$ points. In fact, from these weightings we
can calculate that according to the SPR model all players in the league must hit their two
point shots roughly 45% of the time for their rating from two point shooting to be non-negative.
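The arithmetic behind these two figures can be reproduced directly from the quoted weights and
rates. A player who attempts a shots and makes a fraction q of them contributes
a * (q * w_made + w_attempted), which is non-negative exactly when q is at least
-w_attempted / w_made. The variable names below are hypothetical.

```python
# Weights and per-36-minute rates quoted in the text for two point shooting.
w_made, w_attempted = 3.38, -1.54
lebron_made, lebron_attempted = 7.83, 14.16

# Contribution of two point shooting to the box score rating.
contribution = lebron_made * w_made + lebron_attempted * w_attempted

# Break-even make rate: q * w_made + w_attempted = 0.
break_even_rate = -w_attempted / w_made

print(f"two point contribution: {contribution:.2f}")
print(f"break-even two point percentage: {break_even_rate:.1%}")
```

The same ratio applied to the three point weights gives the break-even rate for three point
shooting discussed next.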

Interestingly enough, a similar calculation reveals that three point shots must only be
hit at a roughly 14% rate to break even. This is counterintuitive: naively one would believe
that hitting two point shots $q$ percent of the time should be equivalent to hitting three point
shots $\tfrac{2}{3}q$ percent of the time. However, three point shooting increases the amount of
spacing on the floor, and perhaps missed three point shots are easier to rebound for the
offensive team.

According to this interpretation of the $\hat z^{\lambda}$ variable, turnovers are extremely
costly, with the corresponding entry of $\hat z^{\lambda,\text{rescaled}}$ equal to $-1.88$. Thus,
LeBron's turnover rate of 3.34 turnovers per 36 minutes hurts his box score rating by roughly
6.28 points.

7 Extending SPR by augmenting the box score 

In this section, we discuss a possible extension to the SPR model. 

The box score matrix R keeps track of statistics like rebounds, assists, and steals. However, 
one might imagine augmenting this basic box score matrix with products of raw statistics 
such as rebounds × assists, blocks × steals, turnovers × free throws made, etc. By 
capturing some of these product statistics and incorporating them into SPR, one might 
more accurately model the value of multifaceted players. 

We expand the matrix R to include all pairwise products of the basic variables. If R is a 
p by d matrix, this leads to a p by $d + \binom{d}{2}$ matrix called 

Poly(R, 2). 
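
As a minimal sketch of this augmentation, the function below appends the product of every pair of distinct columns to each row. It assumes that squared terms are not included, so a row with d statistics grows to d + d(d-1)/2 entries; the name `poly2` and the toy numbers are hypothetical.

```python
from itertools import combinations


def poly2(R):
    """Append all pairwise products of distinct columns to each row of R.

    R is a list of p rows, each a list of d numbers. Every output row holds
    the d original statistics followed by row[a] * row[b] for all a < b.
    """
    augmented = []
    for row in R:
        products = [row[a] * row[b] for a, b in combinations(range(len(row)), 2)]
        augmented.append(list(row) + products)
    return augmented


# Toy box score: 2 players, 3 statistics (say rebounds, assists, steals).
R = [[8.0, 5.0, 1.0],
     [4.0, 9.0, 2.0]]
R2 = poly2(R)
print(len(R2[0]))  # 6
print(R2[0])       # [8.0, 5.0, 1.0, 40.0, 8.0, 5.0]
```

The number of columns grows quadratically in d, which is relevant to the overfitting concern raised later in this section.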

Let us use the notations SPR(R) and SPR(R, 2) to denote SPR with the box score 
matrices R and Poly(R, 2), respectively. Applying the cross-validation procedure described 
in Section 4.2 to the first 820 games of the 2010-2011 season produces the regularization 
parameter 

\*820 _ / o -10 n-l\ 
A CV,2 — \ A )■ 

With this choice of parameter for the expanded box score matrix Poly(R, 2), we can 
then empirically compare its performance to that of the ordinary SPR algorithm using the 
basic box score matrix R. Figure [2] shows the result of this experiment. From this 
figure we see that the additional box score statistics do not seem to substantially improve 
performance. The histogram of the SPR(R, 2) errors is fairly similar to that of the SPR(R) 
errors. We can also study some of the empirical properties of the variable $E_k$ for each 
approach. Table [2] summarizes the results. 

Table 8: Box Score Weights 

Statistic   Description       z        R_{LeBron James}
2M          Per 36 minutes    3.38     7.83
2A          Per 36 minutes    -1.54    14.16
3M          Per 36 minutes    1.4?     1.08
3A          Per 36 minutes    -0.21    3.28
FTM         Per 36 minutes    0.?6     5.91
FTA         Per 36 minutes    -0.??    7.79
OR          Per 36 minutes    0.??     0.94
DR          Per 36 minutes    0.??     5.99
AS          Per 36 minutes    0.??     6.51
ST          Per 36 minutes    1.??     1.46
TO          Per 36 minutes    -1.88    3.34
BK          Per 36 minutes    0.??     0.5?
PF          Per 36 minutes    -0.37    1.92
TC          Per 36 minutes    2.81     0.09
DQ          Per 36 minutes    6.98     0.00
P1          Boolean           -0.17    0.00
P2          Boolean           -0.71    0.00
P3          Boolean           0.29     1.00
P4          Boolean           1.65     0.00

As Table [2] indicates, SPR(R, 2) does not improve upon the predictive power of SPR(R). 
The fraction of games in which the wrong winner is guessed actually increases from 28.54% 
to 29.51%, and the average absolute error in predicting games increases from 10.55 to 11.81. 
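
The two summary statistics quoted above can be computed as in the following sketch. It assumes that each prediction is a signed point margin (home score minus away score) and that the error for game k is the predicted margin minus the actual margin; the function name and the toy data are hypothetical.

```python
def prediction_errors(predicted_margins, actual_margins):
    """Return (fraction of games with the wrong winner, mean absolute error).

    Margins are signed: positive means the home team wins. A predicted
    margin of exactly zero is counted as a wrong pick.
    """
    if len(predicted_margins) != len(actual_margins) or not actual_margins:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(actual_margins)
    wrong = sum(1 for p, a in zip(predicted_margins, actual_margins) if p * a <= 0)
    abs_error = sum(abs(p - a) for p, a in zip(predicted_margins, actual_margins))
    return wrong / n, abs_error / n


# Toy example with four games.
predicted = [5.0, -3.0, 2.0, -7.5]
actual = [10.0, 4.0, -1.0, -12.0]
wrong_fraction, mean_abs_error = prediction_errors(predicted, actual)
print(wrong_fraction, mean_abs_error)  # 0.5 4.875
```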

A possible explanation for this poor statistical performance is that SPR(R, 2) models too 
many pairwise interaction terms, and thus the model is overfitting. 

8 Conclusion 

We have introduced SPR, a powerful new statistical inference procedure for the NBA. We 
compare the statistical performance of our approach to an existing popular technique based 
on least squares and demonstrate empirically that SPR gives more predictive power. We also 
compare SPR to the Las Vegas lines and show that, with sufficient training data, SPR seems 
to predict NBA games better than the Vegas lines. We interpret the estimates produced by 
SPR and discuss what they suggest about who the best players in the NBA are, and which 
players are overrated or underrated. Finally, we discuss a possible extension to the SPR model. 

Table 9: Underrated/Overrated Players 

Player                $\beta^{\lambda}$    $Rz^{\lambda} + z_0^{\lambda} 1_p$    Underrated
Dooling, Keyon        1.3531               -0.3840                               1.7371
Watson, Earl          0.9973               -0.2920                               1.2893
Aldridge, LaMarcus    5.0226               3.8456                                1.1770
Ginobili, Manu        5.4336               4.2599                                1.1738
Tolliver, Anthony     1.9424               0.7742                                1.1682
Bosh, Chris           4.6311               3.4681                                1.1630
Carter, Vince         1.4315               0.3376                                1.0939
Collins, Jason        -2.4071              -3.4634                               1.0563
Hill, George          2.0535               1.0201                                1.0333
Bass, Brandon         2.5777               1.5792                                0.9985

Player                $\beta^{\lambda}$    $Rz^{\lambda} + z_0^{\lambda} 1_p$    Overrated
Dragic, Goran         -2.1439              -0.4758                               -1.6680
Marion, Shawn         0.7049               2.2120                                -1.5071
Gortat, Marcin        2.7433               4.1681                                -1.4249
Ellis, Monta          -0.1175              1.2634                                -1.3810
Biedrins, Andris      0.9205               2.2408                                -1.3203
Jefferson, Al         1.7706               3.0598                                -1.2893
Felton, Raymond       1.4721               2.7513                                -1.2792
Bell, Raja            -2.8779              -1.5992                               -1.2787
Dudley, Jared         1.1287               2.3858                                -1.2572
Law, Acie             -1.7759              -0.6247                               -1.1512

A The Cyclical Coordinate Descent Algorithm for SPR 



There are a variety of techniques for solving the convex program (5), including interior-point 
methods (Boyd and Vandenberghe, 2004), LARS (Efron et al., 2004), iteratively re-weighted 
least squares (Huber, 1974), approximating the $L_1$ term with a smooth function (Lee et al., 
2006), the sub-gradient method (Shor et al., 1985), and Nesterov's proximal gradient method 
(Nesterov, 2003). 

Ultimately, we found experimentally that cyclical coordinate descent (CCD) (Friedman et al., 
2007; Wu and Lange, 2008) was the fastest for our problem. 



The CCD method works by repeatedly optimizing the objective function viewed as a 
function of each variable with the others fixed. This idea gives a CCD algorithm for SPR, 
Algorithm 1. 



Algorithm 1 CCDSPR(X, Y, R, $\lambda$, T) 

  $\alpha_{\mathrm{hca}}(0) \leftarrow 0$, $z_0(0) \leftarrow 0$, $\beta(0) \leftarrow 0_p$, $z(0) \leftarrow 0_d$ 
  for $i \in \{1, 2, \ldots, T\}$ do 
      {Optimize $\alpha_{\mathrm{hca}}$ with all other variables fixed} 
      {Optimize $z_0$ with all other variables fixed} 
      for $k \in \{1, 2, \ldots, p\}$ do 
          {Optimize $\beta_k$ with all other variables fixed} 
      end for 
      for $\ell \in \{1, 2, \ldots, d\}$ do 
          {Optimize $z_\ell$ with all other variables fixed} 
      end for 
  end for 
  return $\alpha_{\mathrm{hca}}(T)$, $z_0(T)$, $\beta(T)$, $z(T)$ 



A.1 Convergence of Algorithm 1 

The correctness of this algorithm for minimizing the objective function (4) follows from 
Lemma A.1. 

Lemma A.1. Let $(\alpha_{\mathrm{hca}}^*, z_0^*, \beta^*, z^*) \in \arg\min g(\alpha_{\mathrm{hca}}, z_0, \beta, z; \lambda)$. 
Then 

$$\lim_{T \to \infty} g(\alpha_{\mathrm{hca}}(T), z_0(T), \beta(T), z(T)) = g(\alpha_{\mathrm{hca}}^*, z_0^*, \beta^*, z^*).$$

Furthermore, when $(\alpha_{\mathrm{hca}}^*, z_0^*, \beta^*, z^*)$ is the unique global minimum of g, 

$$\lim_{T \to \infty} (\alpha_{\mathrm{hca}}(T), z_0(T), \beta(T), z(T)) = (\alpha_{\mathrm{hca}}^*, z_0^*, \beta^*, z^*).$$

Proof. This is a direct consequence of Proposition 5.1 of Tseng (2001). In particular, identify 
$f_0$ and $f_i$, $i = 1, \ldots, p$, of Proposition 5.1 with 
$L_{\mathrm{quadratic}}(\alpha_{\mathrm{hca}}, \beta) + \lambda_2 \|\beta - z_0 1_p - Rz\|_2^2$ and 
$\lambda_1 |\beta_i|$, $i = 1, \ldots, p$, respectively. We observe that 

  Assumption B1 of Tseng (2001) is satisfied, since $f_0$ is continuous. 

  Assumption B2 of Tseng (2001) is satisfied, since f is convex and non-constant on line 
  segments. 

  Assumption B3 is satisfied, since $f_i$, $i = 1, \ldots, p$, are continuous. 

  Assumption C2 is trivially satisfied. 

Therefore, the conditions of Proposition 5.1 of Tseng (2001) are satisfied for Algorithm 1 on 
the objective function (3). 

Since g has at least one global minimum and is convex, we further conclude that 
the limit points of Algorithm 1 are global minima. 

□ 



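A.2 Computing the updates for Algorithm 1 

The updates for $\alpha_{\mathrm{hca}}(i)$, $z_0(i)$, $\beta(i)$, $z(i)$ can be computed in closed form. 

To compute $\alpha_{\mathrm{hca}}(i)$, we can optimize the objective function g viewed as a function only 
of the decision variable $\alpha_{\mathrm{hca}}$ by taking the derivative and setting it to zero. 

This yields the update 

$$\alpha_{\mathrm{hca}} \leftarrow \frac{1_n^T \mathrm{Diag}(w)\,[Y - X\beta]}{1_n^T w},$$

where the denominator equals one when the weights w are normalized to sum to one. 

Similarly, for $z_0(i)$ we get the update 

$$z_0 \leftarrow \frac{1}{p}\, 1_p^T [\beta - Rz].$$

For z, we simply get the least squares update: 

$$z \leftarrow (R^T R)^{-1} R^T [\beta - z_0 1_p].$$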

A.2.1 Updates for $\beta$ 

We next derive a closed-form expression for the updates for $\beta$. To do so, we need Lemma A.2. 

Lemma A.2 (One-variable lasso is soft-thresholding). Let $h(x) := \tfrac{1}{2} A x^2 - Bx + C + \tau |x|$, 
$x \in \mathbb{R}$. Suppose that A > 0. The solution of 

$$\min_x h(x) \qquad (7)$$

is 

$$x^* = \frac{S_\tau(B)}{A},$$

where 

$$S_\tau(x) := \begin{cases} 0 & \text{if } |x| \le \tau \\ x - \tau & \text{if } x > 0 \text{ and } |x| > \tau \\ x + \tau & \text{if } x < 0 \text{ and } |x| > \tau \end{cases}$$

is the soft-thresholding function with threshold $\tau$. 
See Section A.2.2 for a proof of this. 
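
A minimal numerical sketch of the lemma follows. It implements the soft-thresholding function, evaluates the closed-form minimizer, and compares it with a brute-force grid search. The additive constant in h is omitted because it does not affect the minimizer; the function names and test values are illustrative assumptions.

```python
def soft_threshold(x, tau):
    """S_tau(x): shrink x toward zero by tau, returning 0 when |x| <= tau."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def solve_one_variable_lasso(A, B, tau):
    """Minimizer of h(x) = 0.5*A*x**2 - B*x + tau*|x| for A > 0."""
    if A <= 0:
        raise ValueError("A must be positive")
    return soft_threshold(B, tau) / A


def h(x, A, B, tau):
    """One-variable lasso objective (additive constant dropped)."""
    return 0.5 * A * x * x - B * x + tau * abs(x)


# Closed form versus a grid search on [-5, 5] with step 0.001.
A, B, tau = 2.0, 3.0, 1.0
x_star = solve_one_variable_lasso(A, B, tau)   # (3 - 1) / 2
grid = [i / 1000.0 for i in range(-5000, 5001)]
x_grid = min(grid, key=lambda x: h(x, A, B, tau))
print(x_star, x_grid)                          # 1.0 1.0
print(solve_one_variable_lasso(2.0, 0.5, 1.0)) # 0.0, since |B| <= tau
```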

This lemma is useful, as it allows us to immediately write the updates for $\beta_t(i)$. 
First, let us identify A and B for $\beta_t(i)$. Differentiating $g_s$, the smooth part of the 
objective, with respect to $\beta_t$, we get 

$$\begin{aligned}
\frac{\partial}{\partial \beta_t} g_s
&= \frac{\partial}{\partial \beta_t} \left( L_{\mathrm{quadratic}}(\alpha_{\mathrm{hca}}, \beta) + \lambda_2 \|\beta - z_0 1_p - Rz\|_2^2 \right) \\
&= C \sum_i w_i X_{it} \left( -Y_i + \alpha_{\mathrm{hca}} + X_i^T \beta \right) + 2\lambda_2 \left( \beta_t - z_0 - R_t^T z \right) \\
&= C (X e_t)^T W \left( -Y + \alpha_{\mathrm{hca}} 1_n + X\beta \right) + 2\lambda_2 \left( \beta_t - z_0 - R_t^T z \right) \\
&= C (X e_t)^T W \left( -Y + \alpha_{\mathrm{hca}} 1_n + X[\beta - e_t \beta_t + e_t \beta_t] \right) + 2\lambda_2 (\beta_t - \theta_t) \\
&= C (X e_t)^T W \left( K + X[e_t \beta_t] \right) + 2\lambda_2 (\beta_t - \theta_t),
\end{aligned}$$

where C is the constant factor produced by differentiating $L_{\mathrm{quadratic}}$, and 

$$\theta_t := (z_0 1_p + Rz)^T e_t,$$

$$K := -Y + \alpha_{\mathrm{hca}} 1_n + X[\beta - e_t \beta_t].$$

The constant term (with respect to $\beta_t$) of the above expression is 

$$D := C (X e_t)^T W K - 2\lambda_2 \theta_t.$$

The linear term is 

$$C (X e_t)^T W X e_t \beta_t + 2\lambda_2 \beta_t = \left[ C e_t^T X^T W X e_t + 2\lambda_2 \right] \beta_t = E \beta_t,$$

where 

$$E := C e_t^T X^T W X e_t + 2\lambda_2.$$

From this we conclude that for $\beta_t(i)$ 

$$A := E, \qquad B := -D.$$

So, we have the update equation 

$$\beta_t(i) \leftarrow \frac{S_{\lambda_1}(B)}{A}.$$
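
Putting the updates together, Algorithm 1 might be sketched as follows in pure Python. This is an illustration under stated assumptions, not the authors' implementation: the quadratic loss is assumed to be (C/2) Σ_i w_i (Y_i − α_hca − X_i^T β)², so that C is the factor multiplying its gradient; z is updated one coordinate at a time, as in the inner loop of Algorithm 1, rather than with the joint least squares formula; and all names and the toy data are hypothetical.

```python
def soft_threshold(x, tau):
    """S_tau(x) from Lemma A.2."""
    if x > tau:
        return x - tau
    if x < -tau:
        return x + tau
    return 0.0


def objective(X, Y, w, R, lam1, lam2, alpha, z0, beta, z, C=2.0):
    """Assumed objective: (C/2)*weighted squared error + ridge-to-prior + lasso."""
    n, p, d = len(X), len(R), len(R[0])
    fit = sum(w[i] * (Y[i] - alpha - sum(X[i][t] * beta[t] for t in range(p))) ** 2
              for i in range(n))
    prior = sum((beta[t] - z0 - sum(R[t][l] * z[l] for l in range(d))) ** 2
                for t in range(p))
    return 0.5 * C * fit + lam2 * prior + lam1 * sum(abs(b) for b in beta)


def ccd_spr(X, Y, w, R, lam1, lam2, T, C=2.0):
    """Run T sweeps of cyclical coordinate descent (requires lam2 > 0, sum(w) > 0)."""
    n, p, d = len(X), len(R), len(R[0])
    alpha, z0, beta, z = 0.0, 0.0, [0.0] * p, [0.0] * d
    xb = [0.0] * n  # running value of X @ beta
    rz = [0.0] * p  # running value of R @ z
    for _ in range(T):
        # Intercept (home court advantage): weighted mean of the residuals.
        alpha = sum(w[i] * (Y[i] - xb[i]) for i in range(n)) / sum(w)
        # z0: mean of beta - R z.
        z0 = sum(beta[t] - rz[t] for t in range(p)) / p
        # beta_t: soft-thresholding update with A = E and B = -D.
        for t in range(p):
            theta = z0 + rz[t]
            K = [-Y[i] + alpha + xb[i] - X[i][t] * beta[t] for i in range(n)]
            D = C * sum(X[i][t] * w[i] * K[i] for i in range(n)) - 2.0 * lam2 * theta
            E = C * sum(w[i] * X[i][t] ** 2 for i in range(n)) + 2.0 * lam2
            new_beta = soft_threshold(-D, lam1) / E
            for i in range(n):
                xb[i] += X[i][t] * (new_beta - beta[t])
            beta[t] = new_beta
        # z_l: one-dimensional least squares on the partial residual.
        for l in range(d):
            col_sq = sum(R[t][l] ** 2 for t in range(p))
            if col_sq == 0.0:
                continue
            new_z = sum(R[t][l] * (beta[t] - z0 - rz[t] + R[t][l] * z[l])
                        for t in range(p)) / col_sq
            for t in range(p):
                rz[t] += R[t][l] * (new_z - z[l])
            z[l] = new_z
    return alpha, z0, beta, z


# Toy problem: 4 games, 3 players, 2 box score statistics.
X = [[1, -1, 0], [0, 1, -1], [1, 0, -1], [-1, 1, 0]]
Y = [4.0, -2.0, 5.0, -1.0]
w = [1.0, 1.0, 2.0, 1.0]
R = [[1.0, 0.5], [0.2, 1.0], [0.0, 0.3]]
lam1, lam2 = 0.1, 0.5
for T in (0, 1, 10, 100):
    print(T, objective(X, Y, w, R, lam1, lam2, *ccd_spr(X, Y, w, R, lam1, lam2, T)))
```

Because every step exactly minimizes the objective over one variable with the others fixed, the printed objective values should be non-increasing in T, which is a convenient sanity check when adapting the sketch.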



A.2.2 Proof of Lemma A.2 

Proof. The subdifferential of h(x) (Rockafellar, 1970) is the set 

$$\partial h(x) := \sum_k a_k (a_k x - b_k) + \tau\, \partial |x| = xA - B + \tau\, \partial |x|,$$

where the sums run over the terms $(a_k x - b_k)^2$ of the quadratic part of h, and 

$$A := \sum_k a_k^2, \qquad B := \sum_k a_k b_k, \qquad \partial |x| := \begin{cases} \{\mathrm{sign}(x)\} & \text{if } x \neq 0 \\ [-1, 1] & \text{otherwise.} \end{cases}$$

From the theory of convex analysis (Rockafellar, 1970), $x^*$ is the solution of (7) if and 
only if 

$$0 \in \partial h(x^*). \qquad (8)$$

The set $\partial h(x^*)$ behaves differently depending on the value of $x^*$. When $x^* \neq 0$, then 

$$0 \in \partial h(x^*) = \{x^* A - B + \tau\, \mathrm{sign}(x^*)\},$$

which is equivalent to $x^* = \frac{B - \tau\, \mathrm{sign}(x^*)}{A}$. However, when $x^* = 0$, then 

$$0 \in \partial h(x^*) = \{x^* A - B + \tau [-1, 1]\}.$$

We use this observation to deal with the following two cases: 

1. Suppose that $\tau \ge |B|$. Then 

   (a) If $x^* \neq 0$, then 

   $$\tau \ge |B| = |x^* A + \tau\, \mathrm{sign}(x^*)| = |x^*| A + \tau > \tau,$$

   since at least one $a_k \neq 0$ and hence A > 0. This is a contradiction. Therefore $x^* \neq 0$ 
   cannot be a solution when $\tau \ge |B|$. 

   (b) If $x^* = 0$, then 

   $$0 \in -B + \tau [-1, 1] = [-\tau - B, \tau - B],$$

   which is true. 

2. Suppose that $\tau < |B|$. Then 

   (a) If $x^* \neq 0$, then $B - \tau\, \mathrm{sign}(x^*)$ has the same sign as B. Since A is positive, 
   $x^*$ has the same sign as B. So the choice $x^* = \frac{B - \tau\, \mathrm{sign}(B)}{A}$ satisfies the 
   required sub-gradient optimality condition (8) without contradiction. 

   (b) If $x^* = 0$, then $0 \in -B + \tau [-1, 1] = [-\tau - B, \tau - B]$, which is a contradiction. 

Thus, 

1. $\tau \ge |B| \implies x^* = 0$, 

2. $\tau < |B| \implies x^* = \frac{B - \tau\, \mathrm{sign}(B)}{A}$. 

These two cases can be summarized by $x^* = \frac{S_\tau(B)}{A}$, as desired. □ 